Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning
نویسندگان
چکیده
The recent trend toward increasingly deep convolutional neural networks (CNNs) leads to a higher demand of computational power and memory storage. Consequently, the deployment of CNNs in hardware has become more challenging. In this paper, we propose an Intra-Kernel Regular (IKR) pruning scheme to reduce the size and computational complexity of the CNNs by removing redundant weights at a fine-grained level. Unlike other pruning methods such as Fine-Grained pruning, IKR pruning maintains regular kernel structures that are exploitable in a hardware accelerator. Experimental results demonstrate up to 10× parameter reduction and 7× computational reduction at a cost of less than 1% degradation in accuracy versus the un-pruned case.
منابع مشابه
Coarse Pruning of Convolutional Neural Networks with Random Masks
The learning capability of a neural network improves with increasing depth at higher computational costs. Wider layers with dense kernel connectivity patterns further increase this cost and may hinder real-time inference. We propose feature map and kernel pruning for reducing the computational complexity of a deep convolutional neural network. Due to coarse nature, these pruning granularities c...
متن کاملExploring the Regularity of Sparse Structure in Convolutional Neural Networks
Sparsity helps reduce the computational complexity of deep neural networks by skipping zeros. Taking advantage of sparsity is listed as a high priority in the next generation DNN accelerators such as TPU[1]. The structure of sparsity, i.e., the granularity of pruning, affects the efficiency of hardware accelerator design as well as the prediction accuracy. Coarse-grained pruning brings more reg...
متن کاملPruning Convolutional Neural Networks for Resource Efficient Inference
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with finetuning by backpropagation—a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induce...
متن کاملEscort: Efficient Sparse Convolutional Neural Networks on GPUs
Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Alth...
متن کاملPruning Convolutional Neural Networks for Resource Efficient Transfer Learning
We propose a new framework for pruning convolutional kernels in neural networks to enable efficient inference, focusing on transfer learning where large and potentially unwieldy pretrained networks are adapted to specialized tasks. We interleave greedy criteria-based pruning with fine-tuning by backpropagation—a computationally efficient procedure that maintains good generalization in the prune...
متن کامل